A Neural Model for Generating Natural Language Summaries of Program Subroutines
Source code summarization -- creating natural language descriptions of source
code behavior -- is a rapidly-growing research topic with applications to
automatic documentation generation, program comprehension, and software
maintenance. Traditional techniques relied on heuristics and templates built
manually by human experts. Recently, data-driven approaches based on neural
machine translation have largely overtaken template-based systems. But nearly
all of these techniques rely almost entirely on programs having good internal
documentation; without clear identifier names, the models fail to create good
summaries. In this paper, we present a neural model that combines words from
code with code structure from an AST. Unlike previous approaches, our model
processes each data source as a separate input, which allows the model to learn
code structure independent of the text in code. This process helps our approach
provide coherent summaries in many cases even when zero internal documentation
is provided. We evaluate our technique with a dataset we created from 2.1
million Java methods. We find improvement over two baseline techniques from SE
literature and one from NLP literature.
Automatically Extracting Subroutine Summary Descriptions from Unstructured Comments
Summary descriptions of subroutines are short (usually one-sentence) natural
language explanations of a subroutine's behavior and purpose in a program.
These summaries are ubiquitous in documentation, and many tools such as
JavaDocs and Doxygen generate documentation built around them. And yet,
extracting summaries from unstructured source code repositories remains a
difficult research problem -- it is very difficult to generate clean structured
documentation unless the summaries are annotated by programmers. This becomes a
problem in large repositories of legacy code, since it is cost prohibitive to
retroactively annotate summaries in dozens or hundreds of old programs.
Likewise, it is a problem for creators of automatic documentation generation
algorithms, since these algorithms usually must learn from large annotated
datasets, which do not exist for many programming languages. In this paper, we
present a semi-automated approach via crowdsourcing and a fully-automated
approach for annotating summaries from unstructured code comments. We present
experiments validating the approaches, and provide recommendations and cost
estimates for automatically annotating large repositories.
Comment: 10 pages, plus references. Accepted for publication in the 27th IEEE
International Conference on Software Analysis, Evolution and Reengineering,
London, Ontario, Canada, February 18-21, 202
DNAV: A WebGL Based Tool for Visualizing the Twists and Turns in the Human Genome
The human genome is tightly folded to fit within the restricted space of the nucleus. One of the key goals in understanding the folding principles of DNA is to unravel the mysteries of how functional elements that are separated from each other are brought together. Long-range interactions between folded segments of chromosomes form complex three-dimensional networks and are fundamental in controlling gene expression. These long-range interactions have been observed using chromosome conformation capture (3C). This Hi-C data contains a wealth of information on the nearest-neighbor influence on the deviation of the DNA axis that can
be modeled theoretically. We have developed a tool using WebGL to visualize the modeled structures.
Exact expectation values of local fields in quantum sine-Gordon model
We propose an explicit expression for vacuum expectation values of the
exponential fields in the sine-Gordon model. Our expression agrees both with
semi-classical results in the sine-Gordon theory and with perturbative
calculations in the Massive Thirring model. We use this expression to make new
predictions about the large-distance asymptotic form of the two-point
correlation function in the XXZ spin chain.
Comment: 18 pages, harvmac.tex, 2 figures
Who changes the string coupling ?
In general bosonic closed string backgrounds the ghost-dilaton is not the
only state in the semi-relative BRST cohomology that can change the
dimensionless string coupling. This fact is used to establish complete dilaton
theorems in closed string field theory. The ghost-dilaton, however, is the
crucial state: for backgrounds where it becomes BRST trivial we prove that the
string coupling becomes an unobservable parameter of the string action. For
backgrounds where the matter CFT includes free uncompactified bosons we
introduce a refined BRST problem by including the zero-modes "x" of the bosons
as legal operators on the complex. We argue that string field theory can be
defined on this enlarged complex and that its BRST cohomology captures
accurately the notion of a string background. In this complex the ghost-dilaton
appears to be the only BRST-physical state changing the string coupling.
Comment: 34 pages, phyzz